RedshiftのSUPER型でUNNESTと大文字フィールド名を扱ってみる
はじめに
データアナリティクス事業本部のkobayashiです。
RedshiftのSUPER型を使用する案件がありその中でSUPER型のデータを扱う場合にいくつか気になる点があり、ドキュメントを調べながら対応しましたのでその内容をまとめます。
RedshiftのSUPER型
RedshfitのSUPER型は半構造化データをRedshiftへ取り込むことができる型で、主にJSONデータをRedshiftへ格納する際に使います。
SUPER型の説明は弊社ブログにて詳しい記事がありますのでそちらの記事をお読みください。
実際にSUPER型を使っている中でよく使う操作で気になった以下の2点について実際のデータを使いながら試してみます。
- SUPER型データのUNNEST
- 大文字フィールドが含まれているSUPER型データの取り扱い
なお、使用するデータは公式ドキュメントである「半構造化データを Amazon Redshift にロードする - Amazon Redshift 」のnations
テーブルのデータを使います。
"regionkey","name","comment","nations" "0","AFRICA","""lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to ""","[{""n_comment"":"" haggle. carefully final deposits detect slyly agai"",""n_nationkey"":11,""n_name"":""ALGERIA""},{""n_comment"":""ven packages wake quickly. regu"",""n_nationkey"":5,""n_name"":""ETHIOPIA""},{""n_comment"":"" pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t"",""n_nationkey"":14,""n_name"":""KENYA""},{""n_comment"":""rns. blithely bold courts among the closely regular packages use furiously bold platelets?"",""n_nationkey"":15,""n_name"":""MOROCCO""},{""n_comment"":""s. ironic, unusual asymptotes wake blithely r"",""n_nationkey"":16,""n_name"":""MOZAMBIQUE""}]" "1","AMERICA","""hs use ironic, even requests. s""","[{""n_comment"":""al foxes promise slyly according to the regular accounts. bold requests alon"",""n_nationkey"":1,""n_name"":""ARGENTINA""},{""n_comment"":""y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special "",""n_nationkey"":2,""n_name"":""BRAZIL""},{""n_comment"":""eas hang ironic, silent packages. slyly regular packages are furiously over the tithes. fluffily bold"",""n_nationkey"":3,""n_name"":""CANADA""},{""n_comment"":""platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun"",""n_nationkey"":17,""n_name"":""PERU""},{""n_comment"":""y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be"",""n_nationkey"":24,""n_name"":""UNITED STATES""}]" "2","ASIA","""ges. thinly even pinto beans ca""","[{""n_comment"":""ss excuses cajole slyly across the packages. deposits print aroun"",""n_nationkey"":8,""n_name"":""INDIA""},{""n_comment"":"" slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull"",""n_nationkey"":9,""n_name"":""INDONESIA""},{""n_comment"":""ously. final, express gifts cajole a"",""n_nationkey"":12,""n_name"":""JAPAN""},{""n_comment"":""c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos"",""n_nationkey"":18,""n_name"":""CHINA""},{""n_comment"":""hely enticingly express accounts. even, final "",""n_nationkey"":21,""n_name"":""VIETNAM""}]" "3","EUROPE","""ly final courts cajole furiously final excuse""","[{""n_comment"":""refully final requests. regular, ironi"",""n_nationkey"":6,""n_name"":""FRANCE""},{""n_comment"":""l platelets. regular accounts x-ray: unusual, regular acco"",""n_nationkey"":13,""n_name"":""GERMANY""},{""n_comment"":""ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account"",""n_nationkey"":19,""n_name"":""ROMANIA""},{""n_comment"":"" requests against the platelets use never according to the quickly regular pint"",""n_nationkey"":22,""n_name"":""RUSSIA""},{""n_comment"":""eans boost carefully special requests. accounts are. carefull"",""n_nationkey"":23,""n_name"":""UNITED KINGDOM""}]" "4","MIDDLE EAST","""uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl""","[{""n_comment"":""y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d"",""n_nationkey"":17,""n_name"":""EGYPT""},{""n_comment"":""efully alongside of the slyly final dependencies. "",""n_nationkey"":10,""n_name"":""IRAN""},{""n_comment"":""nic deposits boost atop the quickly final requests? quickly regula"",""n_nationkey"":11,""n_name"":""IRAQ""},{""n_comment"":""ic deposits are blithely about the carefully regular pa"",""n_nationkey"":13,""n_name"":""JORDAN""},{""n_comment"":""ts. silent requests haggle. closely express packages sleep across the blithely"",""n_nationkey"":20,""n_name"":""SAUDI ARABIA""}]"
SUPER型データのUNNEST
Redshiftは PartiQL 構文を使ってSUPER型のデータを取り扱うことができます。これにより例えばnations
のテーブルでSUPER型のカラムnation
のデータを展開してフラット化することができますので試してみます。
はじめにnationsのデータをUNNESTしない状態でデータを取得します。
SELECT c.regionkey, c.name, c.comment, c.nations FROM nations c;
regionkey | name | comment | nations |
---|---|---|---|
0 | AFRICA | "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " | [{"n_comment":" haggle. carefully final deposits detect slyly agai",.... |
1 | AMERICA | "hs use ironic, even requests. s" | [{"n_comment":"al foxes promise slyly according to the regular accounts. bold requests alon","n_nationkey":1,"n_name":"ARGENTINA"},.... |
2 | ASIA | "ges. thinly even pinto beans ca" | [{"n_comment":"ss excuses cajole slyly across the packages. deposits print aroun","n_nationkey":8,"n_name":"INDIA"},.... |
3 | EUROPE | "ly final courts cajole furiously final excuse" | [{"n_comment":"refully final requests. regular, ironi","n_nationkey":6,"n_name":"FRANCE"},.... |
4 | MIDDLE EAST | "uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl" | [{"n_comment":"y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d","n_nationkey":17,"n_name":"EGYPT"},.... |
これに対してnation
カラムをUNNESTしてデータを取得します。
SELECT c.regionkey, c.name, c.comment, n.n_name, n.n_nationkey, n.n_comment FROM nations c, c.nations n;
regionkey | name | comment | n_name | n_nationkey | n_comment |
---|---|---|---|---|---|
0 | AFRICA | "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " | "ALGERIA" | 11 | " haggle. carefully final deposits detect slyly agai" |
0 | AFRICA | "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " | "ETHIOPIA" | 5 | "ven packages wake quickly. regu" |
1 | AMERICA | "hs use ironic, even requests. s" | "ARGENTINA" | 1 | "al foxes promise slyly according to the regular accounts. bold requests alon" |
2 | ASIA | "ges. thinly even pinto beans ca" | "INDIA" | 8 | "ss excuses cajole slyly across the packages. deposits print aroun" |
2 | ASIA | "ges. thinly even pinto beans ca" | "INDONESIA" | 9 | " slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull" |
上記のようにRedshiftのSUPER型のUNNESTはFROM句の拡張で扱うことができます。nationsのデータですと、SUPRE型であるnation
のカラムの配列データを展開してフラットなデータを取得することができています。
またATキーワードを使うことでSUPER型データの配列インデックスも取得することができます。
SELECT c.regionkey, c.name, c.comment, n.n_name, n.n_nationkey, n.n_comment, index n_index FROM nations c, c.nations n AT index;
regionkey | name | comment | n_name | n_nationkey | n_comment | n_index |
---|---|---|---|---|---|---|
0 | AFRICA | "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " | "ALGERIA" | 11 | " haggle. carefully final deposits detect slyly agai" | 0 |
0 | AFRICA | "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " | "ETHIOPIA" | 5 | "ven packages wake quickly. regu" | 1 |
0 | AFRICA | "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " | "KENYA" | 14 | " pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t" | 2 |
0 | AFRICA | "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " | "MOROCCO" | 15 | "rns. blithely bold courts among the closely regular packages use furiously bold platelets?" | 3 |
0 | AFRICA | "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " | "MOZAMBIQUE" | 16 | "s. ironic, unusual asymptotes wake blithely r" | 4 |
1 | AMERICA | "hs use ironic, even requests. s" | "ARGENTINA" | 1 | "al foxes promise slyly according to the regular accounts. bold requests alon" | 0 |
1 | AMERICA | "hs use ironic, even requests. s" | "BRAZIL" | 2 | "y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special " | 1 |
このようにUNNESTを使うことで簡単にSUPER型のデータをフラットにすることができます。JSONのデータを一旦RedshiftのSUPER型で取り込んだ後でSQLで展開して必要なデータにするといった形でELT処理を行えば半構造化データから必要なデータを迅速に取り出すことができるので大変便利です。
大文字フィールドが含まれているSUPER型データの取り扱い
JSONデータのフィールド名が大文字であったり、大文字と小文字が混ざったりしていることはよくあると思います。その際にはRedshiftの識別子の大文字・小文字区別の有無を設定するenable_case_sensitive_identifier
をTRUE
にすることでクエリが可能になります。
SET enable_case_sensitive_identifier to TRUE;
nations.nation
に含まれるデータのn_comment
をN_COMMENT
へ、n_nationkey
をN_Nationkey
へ更新したデータで試してみます。大文字を含むフィールドを指定する場合はフィールド名をダブルクオート"
で囲うことで指定が可能になります。
SELECT c.regionkey, c.name, c.comment, n.n_name, n."N_Nationkey", n."N_COMMENT" FROM nations c, c.nations n;
regionkey | name | comment | n_name | N_Nationkey | N_COMMENT |
---|---|---|---|---|---|
2 | ASIA | ges. thinly even pinto beans ca | "INDIA" | 8 | "ss excuses cajole slyly across the packages. deposits print aroun" |
2 | ASIA | ges. thinly even pinto beans ca | "INDONESIA" | 9 | " slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull" |
2 | ASIA | ges. thinly even pinto beans ca | "JAPAN" | 12 | "ously. final, express gifts cajole a" |
2 | ASIA | ges. thinly even pinto beans ca | "CHINA" | 18 | "c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos" |
まとめ
RedshiftのSUPER型の操作でよく使う操作のUNNESTとフィールドの大文字対応を試してみました。どちらも特に難しい操作はなく簡単に半構造化データを扱うことができました。SUPER型が出るまではRedshiftで直接半構造化データを扱うことができないため事前のETL処理が必要でしたがSUPER型を使うことで非常に簡易に扱えるようになりました。
最後まで読んで頂いてありがとうございました。